Statistical mechanics is applied to lossy compression using multilayer perceptrons for unbiased Boolean messages. We utilize a tree-like committee machine (committee tree) and a tree-like parity machine (parity tree) whose transfer functions are monotonic. For compression using a committee tree, the lower bound of the achievable distortion becomes small as the number of hidden units $K$ increases. However, it cannot reach the Shannon bound even in the limit $K \to \infty$. For compression using a parity tree with $K \ge 2$ hidden units, the rate distortion function, which is known as the theoretical limit for compression, is derived in the limit where the code length becomes infinite.

PACS numbers: 89.70.+c, 02.50.-r, 05.50.+q



I. INTRODUCTION 

Cross-disciplinary fields that combine information theory with statistical mechanics have developed rapidly in recent years, and achievements in these fields have become the center of attention. The employment of methods derived from statistical mechanics has resulted in significant progress in providing solutions to several problems in information theory, including problems in error correction, spreading codes, and compression codes. Above all, data compression plays an important role as one of the base technologies in many aspects of information transmission. Data compression is generally classified into lossless compression and lossy compression. Lossless compression is aimed at reducing the size of a message under the constraint of perfect retrieval. In lossy compression, on the other hand, the length of a message can be reduced by allowing a certain amount of distortion. The theoretical framework for lossy compression schemes is called rate distortion theory, which forms a part of Shannon's information theory.




Several lossy compression codes whose schemes saturate the rate distortion function, which represents the optimal performance, have been discovered in the case where the code length becomes infinite. For instance, the Low Density Generator Matrix (LDGM) code and a code using a nonmonotonic perceptron were proposed. In these compression codes, a decoder is first defined to retrieve a reproduced message from a codeword. In the encoding problem, for a given source message, we must find a codeword that minimizes the distortion between the reproduced message and the source message. Therefore, fundamentally, the computational cost of compressing a message grows exponentially with the codeword length. It is important to understand the properties of various lossy compression codes saturating the optimal performance for the development of more useful codes.

Since a multilayer network includes a nonmonotonic perceptron as a special case, we apply a tree-like committee machine and a tree-like parity machine, as typical multilayer networks, to lossy compression and analytically evaluate their performance.

II. LOSSY COMPRESSION 

Let us start by defining the concepts of rate distortion theory [14]. Let $y$ be a discrete random variable with source alphabet $\mathcal{Y}$. We assume that the alphabet is finite. A source message of $M$ random variables, $\boldsymbol{y} = {}^t(y^1, \cdots, y^M) \in \mathcal{Y}^M$, is compressed into a shorter expression, where the operator ${}^t$ denotes the transpose. Here, the encoder describes the source sequence $\boldsymbol{y} \in \mathcal{Y}^M$ by a codeword $\boldsymbol{s} = \mathcal{F}(\boldsymbol{y}) \in \mathcal{S}^N$. The decoder represents $\boldsymbol{y}$ by a reproduced message $\hat{\boldsymbol{y}} = \mathcal{G}(\boldsymbol{s}) \in \hat{\mathcal{Y}}^M$, as illustrated in Fig. 1. Note that $M$ represents the length of a source sequence, while $N$ represents the length of a codeword. The code rate is defined by $R = N/M$ in this case. A distortion function is a mapping $d : \mathcal{Y} \times \hat{\mathcal{Y}} \to \mathbf{R}^+$ from the set of source alphabet-reproduction alphabet pairs into the set of non-negative real numbers. In most cases, the reproduction alphabet $\hat{\mathcal{Y}}$ is the same as the source alphabet $\mathcal{Y}$. Hereafter, we set $\hat{\mathcal{Y}} = \mathcal{Y}$. An example of a common distortion function is the Hamming distortion given by

$$d(y, \hat{y}) = \begin{cases} 0, & y = \hat{y}, \\ 1, & y \ne \hat{y}, \end{cases} \qquad (1)$$
which results in the probability of error distortion, since $E[d(y, \hat{y})] = P[y \ne \hat{y}]$, where $E$ and $P$ represent the expectation and the probability of their arguments, respectively. The distortion between sequences $\boldsymbol{y}, \hat{\boldsymbol{y}} \in \mathcal{Y}^M$ is defined by $d(\boldsymbol{y}, \hat{\boldsymbol{y}}) = \sum_{\mu=1}^{M} d(y^\mu, \hat{y}^\mu)$. Therefore, the distortion associated with the code is defined as $D = E[\frac{1}{M} d(\boldsymbol{y}, \hat{\boldsymbol{y}})]$, where the expectation is with respect to the probability distribution on $\mathcal{Y}$. A rate distortion pair $(R, D)$ is said to be achievable if there exists a sequence of rate distortion codes $(\mathcal{F}, \mathcal{G})$ with $E[\frac{1}{M} d(\boldsymbol{y}, \hat{\boldsymbol{y}})] \le D$ in the limit $M \to \infty$. We can now define a function to describe the boundary, called the rate distortion function. The rate distortion function $R(D)$ is the infimum of rates $R$ such that $(R, D)$ is in the rate distortion region of the source for a given distortion $D$ and all rate distortion codes. The infimum of rates $R$ for a given distortion $D$ and given rate distortion codes $(\mathcal{F}, \mathcal{G})$ is called the rate distortion property of $(\mathcal{F}, \mathcal{G})$. We restrict ourselves to a Boolean source $\mathcal{Y} = \{0, 1\}$. We assume that the source sequence is not biased, to rule out the possibility of compression due to redundancy. We thus consider a non-biased Boolean message in which each component is generated independently from an identical distribution $P(y^\mu = 1) = P(y^\mu = 0) = 1/2$. For this simple source, the rate distortion function for an unbiased Boolean source with Hamming distortion is given by



$$R(D) = 1 - h_2(D), \qquad (2)$$

where $h_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ denotes the binary entropy function.
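As a quick numerical illustration of Eq. (2), the following minimal Python sketch evaluates the binary entropy function and the resulting rate distortion function. The function names `binary_entropy` and `rate_distortion` are our own labels rather than notation from the text, and the convention $R(D) = 0$ for $D \ge 1/2$ is the usual one for this source, not something stated above.

```python
import math


def binary_entropy(x: float) -> float:
    """h_2(x) = -x log2(x) - (1 - x) log2(1 - x), with h_2(0) = h_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion(d: float) -> float:
    """R(D) = 1 - h_2(D) for an unbiased Boolean source with Hamming distortion."""
    if d >= 0.5:
        return 0.0  # assumed convention: no bits are needed beyond D = 1/2
    return 1.0 - binary_entropy(d)


if __name__ == "__main__":
    for d in (0.0, 0.05, 0.1, 0.25, 0.5):
        print(f"D = {d:4.2f}  R(D) = {rate_distortion(d):.4f}")
```

For example, a distortion of about 0.11 already halves the required code rate, since $h_2(0.11) \approx 0.5$.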
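FIG. 1: Rate distortion encoder and decoder.

FIG. 2: The architecture of tree-like multilayer perceptrons with $N$ input units and $K$ hidden units.

III. COMPRESSION USING MULTILAYER PERCEPTRONS

To simplify notations, let us replace all the Boolean representations $\{0, 1\}$ with the Ising representation $\{1, -1\}$ throughout the rest of this paper. We set $\mathcal{Y} = \mathcal{S} = \hat{\mathcal{Y}} = \{1, -1\}$ as the binary alphabets. We consider an unbiased source message in which each component is generated independently from an identical distribution,

$$P(y^\mu) = \frac{1}{2} \delta(y^\mu - 1) + \frac{1}{2} \delta(y^\mu + 1), \qquad (3)$$

for simplicity. First, let us define a decoder. We can construct a nonlinear map $\mathcal{G} : \mathcal{S}^N \to \hat{\mathcal{Y}}^M$ from a codeword $\boldsymbol{s} \in \mathcal{S}^N$ to a reproduced message $\hat{\boldsymbol{y}} = (\hat{y}^\mu) \in \hat{\mathcal{Y}}^M$. For a given source message $\boldsymbol{y} = (y^\mu) \in \mathcal{Y}^M$, the role of the encoder is to find a codeword $\boldsymbol{s} \in \mathcal{S}^N$ that minimizes the distortion between its reproduced message $\mathcal{G}(\boldsymbol{s})$ and the source message $\boldsymbol{y}$.

We choose a nonlinear map $\mathcal{G}$ utilizing tree-like multilayer perceptrons, i.e., a tree-like committee machine (committee tree) and a tree-like parity machine (parity tree). Figure 2 shows their architecture. The codeword $\boldsymbol{s}$ is divided into $K$ disjoint $N/K$-dimensional vectors $\boldsymbol{s}_1, \cdots, \boldsymbol{s}_K \in \mathcal{S}^{N/K}$ as $\boldsymbol{s} = {}^t(\boldsymbol{s}_1, \cdots, \boldsymbol{s}_K)$. The $l$th hidden unit receives the vector $\boldsymbol{s}_l$. The outputs of the committee tree and the parity tree are a majority decision and a parity of the hidden unit outputs, respectively. The $\mu$th bit of the reproduced message $\hat{y}^\mu$ is defined by utilizing the committee tree as

$$\hat{y}^\mu(\boldsymbol{s}) = \mathrm{sgn}\left( \sum_{l=1}^{K} f\left( \sqrt{\frac{K}{N}}\, \boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu \right) \right), \qquad (4)$$

where $\boldsymbol{x}_l^\mu \sim \mathcal{N}(0, 1)$ are fixed $N/K$-dimensional vectors and the map $f : \mathbf{R} \to \hat{\mathcal{Y}}$ is a transfer function. The function $\mathrm{sgn}(x)$ denotes the sign function taking 1 for $x \ge 0$ and $-1$ for $x < 0$. Similarly, the $\mu$th bit $\hat{y}^\mu$ of the reproduced message is also defined by utilizing the parity tree as

$$\hat{y}^\mu(\boldsymbol{s}) = \prod_{l=1}^{K} f\left( \sqrt{\frac{K}{N}}\, \boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu \right). \qquad (5)$$

The decoder $\mathcal{G}$ from the codeword $\boldsymbol{s}$ to the reproduced message $\hat{\boldsymbol{y}} = (\hat{y}^\mu)$ is described as

$$\mathcal{G}(\boldsymbol{s}) = \hat{\boldsymbol{y}}(\boldsymbol{s}) = {}^t(\hat{y}^1(\boldsymbol{s}), \cdots, \hat{y}^M(\boldsymbol{s})). \qquad (6)$$

In this framework, the encoder $\mathcal{F}$ from the original message $\boldsymbol{y}$ to the codeword $\boldsymbol{s}$ can be written as

$$\mathcal{F}(\boldsymbol{y}) = \mathop{\mathrm{argmin}}_{\boldsymbol{s}}\, d(\boldsymbol{y}, \mathcal{G}(\boldsymbol{s})), \qquad (7)$$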
with respect to the case of both the committee tree and the parity tree. Employing the Ising representation, where the length of the codeword is infinite, the average Hamming distortion can be represented as

$$E[d(\boldsymbol{y}, \hat{\boldsymbol{y}})] = \sum_{\mu=1}^{M} \left[ 1 - \Theta(y^\mu \hat{y}^\mu) \right], \qquad (8)$$

where the function $\Theta(x)$ denotes the step function taking 1 for $x \ge 0$ and 0 otherwise. Since we assume an unbiased source message in this paper, we set $f(x) = \mathrm{sgn}(x)$.
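The decoder and encoder described above can be sketched for very small systems as follows. This is only an illustration under stated assumptions: the encoder is an exhaustive search over all $2^N$ codewords (feasible only for tiny $N$), the input scaling $\sqrt{K/N}$ is an assumed normalization that has no effect when the transfer function is the sign function, and all function names are hypothetical labels of ours.

```python
import itertools
import math
import random


def sgn(x):
    """Sign function: 1 for x >= 0 and -1 for x < 0."""
    return 1 if x >= 0 else -1


def hidden_outputs(s, x, K):
    """Outputs f(sqrt(K/N) s_l . x_l) of the K hidden units, with f = sgn."""
    N = len(s)
    block = N // K  # each hidden unit sees N/K inputs
    scale = math.sqrt(K / N)  # assumed normalization; irrelevant for f = sgn
    return [
        sgn(scale * sum(s[l * block + i] * x[l * block + i] for i in range(block)))
        for l in range(K)
    ]


def committee_tree(s, x, K):
    """Majority decision of the hidden-unit outputs."""
    return sgn(sum(hidden_outputs(s, x, K)))


def parity_tree(s, x, K):
    """Parity (product) of the hidden-unit outputs."""
    return math.prod(hidden_outputs(s, x, K))


def decode(s, xs, K, unit):
    """Decoder: one reproduced bit per fixed input vector."""
    return [unit(s, x, K) for x in xs]


def hamming(y, y_hat):
    """Number of positions in which two messages differ."""
    return sum(1 for a, b in zip(y, y_hat) if a != b)


def encode(y, xs, N, K, unit):
    """Brute-force encoder: search all 2^N codewords for minimum distortion."""
    best = min(
        itertools.product((1, -1), repeat=N),
        key=lambda s: hamming(y, decode(s, xs, K, unit)),
    )
    return list(best)


if __name__ == "__main__":
    rng = random.Random(0)
    N, K, M = 12, 3, 16  # code rate R = N/M = 0.75
    xs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    y = [rng.choice((1, -1)) for _ in range(M)]
    for name, unit in (("committee", committee_tree), ("parity", parity_tree)):
        s = encode(y, xs, N, K, unit)
        print(name, "distortion D =", hamming(y, decode(s, xs, K, unit)) / M)
```

The exhaustive search makes the exponential encoding cost explicit; results for such small $N$ are dominated by finite-size effects and should not be compared directly with the asymptotic analysis.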

This encoding scheme is essentially the same as learning in multilayer perceptrons, for the following reason. We first assign the random input vector $\boldsymbol{x}^\mu$ to each bit $y^\mu$ of the original message. The encoder must find a weight vector $\boldsymbol{s}$ which satisfies the input-output relations $\boldsymbol{x}^\mu \mapsto y^\mu$ as much as possible. Then we use this optimal weight vector $\boldsymbol{s}$ as a codeword. Therefore, in the lossless case of $D = 0$, an evaluation of the rate distortion property of these codes is entirely identical to the calculation of the storage capacity.



IV. ANALYTICAL EVALUATION

We analytically evaluate the typical performance of the proposed compression scheme using the replica method, following Hosaka et al. [9]. The minimum permissible average distortion $D$ is calculated when the code rate $R$ is fixed. For a given original message $\boldsymbol{y}$ and the input vectors $\{\boldsymbol{x}_l^\mu\}$, the number of codewords $\boldsymbol{s}$ which provide a fixed Hamming distortion $MD = d(\boldsymbol{y}, \hat{\boldsymbol{y}})$ can be expressed as

$$\mathcal{N}(D, R) = \mathop{\mathrm{Tr}}_{\boldsymbol{s}}\, \delta\big(MD; d(\boldsymbol{y}, \hat{\boldsymbol{y}}(\boldsymbol{s}))\big), \qquad (9)$$

where $\delta(m; n)$ denotes Kronecker's delta taking 1 if $m = n$ and 0 otherwise. Since the original message $\boldsymbol{y}$ and the input vectors $\{\boldsymbol{x}_l^\mu\}$ are randomly generated predetermined variables, the quenched average of the entropy per bit over these parameters,

$$S(D, R) = \frac{1}{N} \left\langle \ln \mathcal{N}(D, R) \right\rangle_{\boldsymbol{y}, \boldsymbol{x}}, \qquad (10)$$

is naturally introduced for investigating the typical properties, where $\langle \cdots \rangle_{\boldsymbol{y}, \boldsymbol{x}}$ denotes the average over $\boldsymbol{y}$ and $\{\boldsymbol{x}_l^\mu\}$. We calculate the entropy $S(D, R)$ by the replica method (see Appendix A). The rate-distortion region can be represented by $\{(D, R) \mid S(D, R) \ge 0\}$. Therefore, a minimum code rate $R$ for a fixed distortion $D$ is given by a solution of $S(D, R) = 0$.

Note that a minimum code rate $R$ for $D = 0$ coincides with the reciprocal of the critical storage capacity of a multilayer perceptron, i.e., the critical storage capacity $\alpha_c$ $(= M/N)$ can be obtained from $S(0, 1/\alpha_c) = 0$.

A. Replica symmetric theory of lossy compression using committee tree

1. For general K

In the lossy compression using the committee tree, we obtain the average entropy $S_{CT}(D, R)$ as

$$S_{CT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \ln\left\{ e^{-\beta} + (1 - e^{-\beta})\, \Sigma(\{t_l\}; y) \right\} \right\rangle_y + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (11)$$

where

$$\Sigma(\{t_l\}; y) = \mathop{\mathrm{Tr}}_{\{\tau_l = \pm 1\}} \Theta\Big( -y \sum_{l=1}^{K} \tau_l \Big) \prod_{l=1}^{K} H(Q \tau_l t_l), \qquad (12)$$

with $Q = \sqrt{q/(1 - q)}$ (see Appendix A 1). For any $K$, we can obtain a minimum code rate $R$, which gives $S_{CT}(D, R) = 0$ for a fixed distortion $D$.
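The codeword count of Eq. (9) can be made concrete by exhaustive enumeration for a tiny parity tree. The sketch below is a single-sample illustration only: it uses one random draw of the message and input vectors instead of the quenched average of Eq. (10), the system is far from the thermodynamic limit, and the helper names are hypothetical.

```python
import itertools
import math
import random


def parity_tree_bit(s, x, K):
    """Parity-tree output for one input vector x (f = sgn, so scaling is omitted)."""
    block = len(s) // K
    out = 1
    for l in range(K):
        field = sum(s[l * block + i] * x[l * block + i] for i in range(block))
        out *= 1 if field >= 0 else -1
    return out


def codeword_counts(y, xs, N, K):
    """counts[d] = number of codewords whose reproduced message differs from y in d bits."""
    counts = [0] * (len(y) + 1)
    for s in itertools.product((1, -1), repeat=N):
        d = sum(1 for mu, x in enumerate(xs) if parity_tree_bit(s, x, K) != y[mu])
        counts[d] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    N, K, M = 12, 2, 16
    xs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    y = [rng.choice((1, -1)) for _ in range(M)]
    for d, c in enumerate(codeword_counts(y, xs, N, K)):
        if c > 0:
            # ln(count)/N is the finite-size analogue of the entropy per bit
            print(f"D = {d / M:.3f}  count = {c:5d}  ln(count)/N = {math.log(c) / N:.3f}")
```

The smallest distortion with a nonzero count plays the role of the point where the entropy vanishes.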



2. For large K 

We concentrate in the following on the simple case of large $K$, where the $K$-multiple integrals can be reduced to a single Gaussian integral. We assume that the number of hidden units $K$ is large but still $K \ll N$. Using the central limit theorem, the averaged entropy is given by

$$S_{CT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \int Dt \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) H(Q_{\mathrm{eff}}\, t) \right\} + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (13)$$

where $q_{\mathrm{eff}} = \int Dt\, [1 - 2H(Qt)]^2 = \frac{2}{\pi} \sin^{-1} q$ and $Q_{\mathrm{eff}} = \sqrt{q_{\mathrm{eff}}/(1 - q_{\mathrm{eff}})}$ (see Appendix A 2). Figure 3 shows the limit of the achievable code rate $R$ expected for $N \to \infty$ plotted versus the distortion $D$ for $K = 1, 3$ and $K \to \infty$. For a fixed code rate $R$, the achievable distortion decreases as the number of hidden units $K$ increases. However, it does not saturate Shannon's limit even in the limit $K \to \infty$. For large $K$, the EA order parameter $q$, which means the average overlap between different codewords, does not converge to zero. Since this means that codewords are correlated, the distribution of codewords is biased in $\mathcal{S}^N$. Note that a nonzero EA order parameter does not mean that the reproduced message has a nonzero average, because the random input vectors have zero average.

FIG. 3: The rate distortion property of lossy compression using a committee tree. The limit of the achievable code rate $R$ expected for $N \to \infty$ is plotted versus the distortion $D$ for $K = 1$ (dotted line), $K = 3$ (short dashed line) and $K \to \infty$ (long dashed line). The solid line denotes Shannon's rate-distortion function $R(D)$ for binary sequences.
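The closed form $q_{\mathrm{eff}} = \frac{2}{\pi} \sin^{-1} q$ quoted for the large-$K$ committee tree can be checked numerically. The sketch below assumes the conventional definitions $Dt = e^{-t^2/2}\, dt / \sqrt{2\pi}$ and $H(x) = \int_x^\infty Dt$, which are not spelled out in the text above, and uses a plain trapezoidal rule; the function names are ours.

```python
import math


def H(x):
    """Gaussian tail H(x) = int_x^inf Dt, written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_eff_numeric(q, half_width=10.0, steps=4000):
    """Trapezoidal estimate of int Dt [1 - 2 H(Q t)]^2 with Q = sqrt(q / (1 - q)), 0 <= q < 1."""
    Q = math.sqrt(q / (1.0 - q))
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += weight * gauss * (1.0 - 2.0 * H(Q * t)) ** 2
    return total * h


def q_eff_closed(q):
    """Closed form (2 / pi) arcsin(q)."""
    return 2.0 / math.pi * math.asin(q)


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        print(f"q = {q:.1f}  numeric = {q_eff_numeric(q):.6f}  closed = {q_eff_closed(q):.6f}")
```

Note that $q_{\mathrm{eff}} < q$ for $0 < q < 1$, so the effective overlap seen by the output unit is smaller than the overlap of the codewords themselves.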



B. Replica symmetric theory of lossy compression 
using parity tree 

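In the lossy compression using the parity tree, on the other hand, we obtain the averaged entropy $S_{PT}(D, R)$ as

$$S_{PT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \ln\left\{ e^{-\beta} + (1 - e^{-\beta})\, \Pi(\{t_l\}; y) \right\} \right\rangle_y + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (14)$$

where

$$\Pi(\{t_l\}; y) = \frac{1}{2} \left( 1 + y \prod_{l=1}^{K} [1 - 2H(Q t_l)] \right). \qquad (15)$$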



For the cases utilizing a committee tree and a parity tree, only the terms $\Sigma(\{t_l\}; y)$ and $\Pi(\{t_l\}; y)$ are different. Since both the order parameters $q$ and $\hat{q}$ at the saddle-point of (14) are less than one, the average entropy can be expanded with respect to $\prod_{l=1}^{K} [1 - 2H(Q t_l)]$ $(< 1)$. Solutions of the saddle-point equations derived from the expanded form of the average entropy are obtained as

$$q = 0, \qquad \hat{q} = 0, \qquad D = \frac{e^{-\beta}}{1 + e^{-\beta}}, \qquad (16)$$

in the case $K \ge 2$ (see Appendix A 3). For $K = 1$, $q > 0$ holds. Note that for $K = 1$, a parity tree is equivalent to a committee tree. For $K \ge 2$, the order parameter $q$ becomes zero, namely all codewords are uncorrelated and distributed all around $\mathcal{S}^N$. For $K \ge 2$, substituting (16) into (14), the average entropy is obtained as

$$S_{PT}(D, R) = -R^{-1} \ln 2 + \ln 2 - R^{-1} D \ln D - R^{-1} (1 - D) \ln (1 - D). \qquad (17)$$

A minimum code rate $R$ for a fixed distortion $D$ and $K \ge 2$ is given by $S_{PT}(D, R) = 0$. Solving this equation with respect to $R$, we obtain

$$R = 1 - h_2(D) \equiv R_{RS}(D), \qquad (18)$$
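The step from the saddle-point solution to Eqs. (17) and (18) can be checked numerically. In the sketch below, `entropy_at_saddle` is our own evaluation of the replica symmetric entropy at $q = \hat{q} = 0$ (where $H(0) = 1/2$ reduces the logarithm to $\ln[(1 + e^{-\beta})/2]$), `entropy_closed_form` is Eq. (17), and `minimum_rate` solves $S_{PT}(D, R) = 0$ by bisection. All names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def entropy_at_saddle(beta, rate):
    """RS entropy at q = q_hat = 0, with D fixed by D = e^{-beta} / (1 + e^{-beta})."""
    e = math.exp(-beta)
    d = e / (1.0 + e)
    s = (math.log((1.0 + e) / 2.0) + beta * d) / rate + math.log(2.0)
    return s, d


def entropy_closed_form(d, rate):
    """Eq. (17): the entropy expressed through the distortion D alone."""
    return (
        math.log(2.0)
        - (math.log(2.0) + d * math.log(d) + (1.0 - d) * math.log(1.0 - d)) / rate
    )


def minimum_rate(d, tol=1e-12):
    """Solve S_PT(D, R) = 0 for R by bisection on (0, 1]."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if entropy_closed_form(d, mid) < 0.0:
            lo = mid  # entropy negative: rate too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for d in (0.05, 0.2, 0.4):
        print(f"D = {d:.2f}  solved R = {minimum_rate(d):.6f}  1 - h2(D) = {1.0 - h2(d):.6f}")
```

The two columns printed by the usage example agree, which is the content of Eq. (18).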



which is identical to the rate-distortion function for uniformly unbiased binary sources.

However, since the calculation is based on the RS ansatz, we verify the AT stability to confirm the validity of this solution. As the RS solution to lossy compression using a parity tree with $K = 2$ hidden units can be simply expressed as (16), the stability condition is analytically obtained as

$$R > \frac{8}{\pi^2} (1 - 2D)^2 \equiv R_{AT}(D), \qquad (19)$$

where the boundary $R = R_{AT}(D)$ is called the AT line (see Appendix B). For $K \ge 3$, the RS solution does not exhibit the AT instability throughout the achievable region of the rate-distortion pair $(R, D)$. Figure 4 shows the limit of the achievable distortion $D$ expected for $N \to \infty$ plotted versus the code rate $R$ for $K = 1$ and $K \ge 2$. In the case $K \ge 2$, the limit of the achievable distortion is identical to the rate-distortion function. The dash-dotted line in Fig. 4 denotes the AT line for $K = 2$. In the region above the AT line, the RS solution is stable. For $K = 2$, we found that for distortion $0.126 < D < 0.5$, $R_{RS}(D)$ can become smaller than $R_{AT}(D)$. Nevertheless, this instability may not be serious in practice, because the region where the RS solution becomes unstable is narrow.
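The quoted threshold $D \approx 0.126$ can be reproduced numerically. The sketch below assumes that the AT line for $K = 2$ has the form $R_{AT}(D) = \frac{8}{\pi^2}(1 - 2D)^2$, a reading of the stability condition that is consistent with the quoted threshold but should be treated as an assumption here, and compares it with $R_{RS}(D) = 1 - h_2(D)$. Function names are ours.

```python
import math


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_rs(d):
    """Replica symmetric rate R_RS(D) = 1 - h_2(D)."""
    return 1.0 - h2(d)


def rate_at(d):
    """Assumed AT line for K = 2: R_AT(D) = (8 / pi^2) (1 - 2 D)^2."""
    return 8.0 / math.pi ** 2 * (1.0 - 2.0 * d) ** 2


def crossing(lo=0.01, hi=0.3, tol=1e-10):
    """Bisection for the distortion at which R_RS(D) = R_AT(D)."""
    def gap(d):
        return rate_rs(d) - rate_at(d)

    assert gap(lo) > 0.0 > gap(hi)  # the bracket must contain a sign change
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    d_star = crossing()
    print(f"R_RS(D) = R_AT(D) at D = {d_star:.4f}")
    for d in (0.05, 0.2, 0.4):
        print(f"D = {d:.2f}  R_RS = {rate_rs(d):.4f}  R_AT = {rate_at(d):.4f}")
```

Above the crossing the two curves differ only slightly, which matches the remark that the unstable region is narrow.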

The annealed approximation of the entropy (10) gives a lower bound to the rate distortion property. It coincides with the rate distortion function. According to Opper's discussion, the entropy (10) can be formally represented by the information entropy. The annealed information entropy can give an upper bound to the rate distortion property. However, its evaluation is difficult (see Appendix).
FIG. 4: The rate distortion property of lossy compression using a parity tree. The limits of the achievable code rate $R$ expected for $N \to \infty$ are plotted versus the distortion $D$ for $K = 1$ (dashed line) and $K \ge 2$ (solid line). The solid line also denotes the rate-distortion function, which is identical to the limit of the achievable distortion for $K \ge 2$. The dash-dotted line denotes the AT line for $K = 2$. For $K \ge 3$, the RS solution does not exhibit the AT instability throughout the achievable region.



V. DISTRIBUTION OF CODEWORDS 

It has already been shown that both compression using a sparse matrix and compression using a nonmonotonic perceptron achieve the optimal performance known as Shannon's limit. All these schemes, as well as compression using a parity tree with $K \ge 2$ hidden units, share a vanishing EA order parameter, $q = 0$. In compression using a nonmonotonic perceptron, the $\mu$th bit of the reproduced message is defined as $\hat{y}^\mu(\boldsymbol{s}) = f(N^{-1/2}\, \boldsymbol{s} \cdot \boldsymbol{x}^\mu)$, where $f$ is a transfer function with mirror symmetry, i.e., $f(-x) = f(x)$. Due to the mirror symmetry of $f$, both $\boldsymbol{s}$ and $-\boldsymbol{s}$ provide identical output for any $\boldsymbol{x}^\mu$. Hence, the EA order parameter is likely to become zero. The transfer function $f$ with parameter $k$ is defined as taking 1 for $|x| \le k$ and $-1$ otherwise. Figure 5 shows the relationship between a codeword and a bit of the reproduced message. Figure 5(a) is the case of compression using a nonmonotonic perceptron.

In compression using a parity tree, on the other hand, the $\mu$th bit of the reproduced message satisfies

$$\hat{y}^\mu(-\boldsymbol{s}) = \prod_{l=1}^{K} \mathrm{sgn}\left( \sqrt{\frac{K}{N}}\, \boldsymbol{x}_l^\mu \cdot (-\boldsymbol{s}_l) \right) = (-1)^K\, \hat{y}^\mu(\boldsymbol{s}). \qquad (20)$$

For $K = 1$, i.e., when a parity tree is identical to a monotonic perceptron, $\hat{y}^\mu(-\boldsymbol{s}) = -\hat{y}^\mu(\boldsymbol{s})$ holds. Here, the EA order parameter becomes $q > 0$. Therefore, the distribution of codewords is biased in $\mathcal{S}^N$. Compression using a parity tree with $K = 1$ hidden unit cannot achieve Shannon's limit. Figure 5(b) shows the case of compression using a monotonic perceptron, i.e., a committee tree and a $K = 1$ parity tree. However, for an even number of hidden units $K$, a parity tree also has the same effect as mirror symmetry.

We will next discuss the case of $K \ge 2$. Let $V(\boldsymbol{s}) \subset \mathcal{S}^N$ be the set of vectors obtained by reversing the signs of an arbitrary even number of blocks of a codeword $\boldsymbol{s} = {}^t(\boldsymbol{s}_1, \cdots, \boldsymbol{s}_K)$, e.g., ${}^t(-\boldsymbol{s}_1, -\boldsymbol{s}_2, \boldsymbol{s}_3, \cdots, \boldsymbol{s}_K) \in V(\boldsymbol{s})$. The cardinality of the set $V(\boldsymbol{s})$ is

$$\|V(\boldsymbol{s})\| = \sum_{n=0}^{\lfloor K/2 \rfloor} {}_K C_{2n} = 2^{K-1}, \qquad (21)$$

where $\lfloor x \rfloor$ is the largest integer $\le x$. According to (20), all $\boldsymbol{s}' \in V(\boldsymbol{s})$ provide identical output for any $\boldsymbol{x}_l^\mu$. The summation of all $\boldsymbol{s}' \in V(\boldsymbol{s})$ becomes

$$\sum_{\boldsymbol{s}' \in V(\boldsymbol{s})} \boldsymbol{s}' = {}^t\big( \cdots, 2^{K-2} \boldsymbol{s}_l + 2^{K-2} (-\boldsymbol{s}_l), \cdots \big) = \boldsymbol{0}. \qquad (22)$$

This means that $2^{K-1}$ vectors with the same distortion as codeword $\boldsymbol{s}$ are distributed throughout $\mathcal{S}^N$. For instance, Fig. 5(c) shows the distribution of codewords obtained by compression using a $K = 2$ parity tree. The set $\mathcal{S}^N$ is divided by two $(N-1)$-dimensional hyperplanes whose normal vectors are orthogonal to each other. For the $\mu$th bit of the reproduced message, the normal vectors of the hyperplanes are ${}^t(\boldsymbol{x}_1^\mu, \boldsymbol{0})$ and ${}^t(\boldsymbol{0}, \boldsymbol{x}_2^\mu) \in \mathbf{R}^N$. Figure 5(d) shows the case of compression using a $K = 3$ parity tree. Here, although the same effect as mirror symmetry cannot be seen, the EA order parameter $q$ nevertheless becomes zero for the reason mentioned above. This situation is the same for $K \ge 4$.
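The properties of the set $V(\boldsymbol{s})$ stated in Eqs. (21) and (22) can be verified by direct enumeration for small $K$. The following sketch builds $V(\boldsymbol{s})$ by flipping every even-sized subset of blocks and checks its cardinality, its vanishing sum, and the invariance of the parity-tree output. The helper names are hypothetical, and the parity-tree bit omits the input scaling because only the sign matters.

```python
import itertools
import random


def flip_blocks(s, K, flips):
    """Reverse the sign of every component in the blocks listed in `flips`."""
    block = len(s) // K
    return [-v if (i // block) in flips else v for i, v in enumerate(s)]


def equivalent_codewords(s, K):
    """The set V(s): s with the signs of an even number of blocks reversed."""
    return [
        flip_blocks(s, K, set(flips))
        for n in range(0, K + 1, 2)
        for flips in itertools.combinations(range(K), n)
    ]


def parity_tree_bit(s, x, K):
    """Product over hidden units of sgn(s_l . x_l)."""
    block = len(s) // K
    out = 1
    for l in range(K):
        field = sum(s[l * block + i] * x[l * block + i] for i in range(block))
        out *= 1 if field >= 0 else -1
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    N, K = 12, 4
    s = [rng.choice((1, -1)) for _ in range(N)]
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    V = equivalent_codewords(s, K)
    print("|V(s)| =", len(V), " 2^(K-1) =", 2 ** (K - 1))
    print("component-wise sum:", [sum(col) for col in zip(*V)])
    print("distinct outputs:", {parity_tree_bit(v, x, K) for v in V})
```

Each block appears with either sign in exactly half of the members of $V(\boldsymbol{s})$, which is why the sum vanishes for every $K \ge 2$.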

With respect to the LDGM code, Murayama succeeded in developing a practical encoder using the Thouless-Anderson-Palmer (TAP) approach, into which an inertia term was introduced heuristically. The TAP approach is called belief propagation (BP) in the field of information theory. Hosaka et al. applied this BP with an inertia term to compression using a nonmonotonic perceptron. In compression using a parity tree with $K$ hidden units, the number of codewords which give the minimum distortion is $2^{K-1}$. Therefore, it may become easier to find codewords as the number of hidden units $K$ becomes large. In a practical encoding problem, however, it may not be easy to use a large $K$, since $K \ll N$ is needed.



FIG. 5: Relationship between a codeword and a bit of the reproduced message in lossy compression using a parity tree with $K$ hidden units. The symbol $+$ denotes that the bit of the reproduced message is 1, and $-$ denotes that it is $-1$. The set $\mathcal{S}^N$ is divided by $K$ hyperplanes whose normal vectors are orthogonal to each other. For $K \ge 2$, vectors with the same distortion as codeword $\boldsymbol{s}$ are distributed throughout $\mathcal{S}^N$. (a) A nonmonotonic perceptron, $q = 0$; (b) a $K = 1$ parity tree, $q > 0$; (c) a $K = 2$ parity tree, $q = 0$; and (d) a $K = 3$ parity tree, $q = 0$.

VI. CONCLUSION

We investigated a lossy compression scheme for unbiased Boolean messages employing a committee tree and a parity tree, whose transfer functions were monotonic. The lower bound for the achievable distortion in using a committee tree became small when the number of hidden units $K$ was large. It did not reach Shannon's limit even in the case where $K \to \infty$. However, lossy compression using a parity tree with $K \ge 2$ hidden units could achieve Shannon's limit where the code length became infinite. We assumed the RS ansatz in our analysis using the replica method. In using a parity tree with $K = 2$, the RS solution was unstable in a narrow region. For $K \ge 3$, the RS solution did not exhibit the AT instability throughout the achievable region.

There is generally more than one codeword with the same distortion as a given codeword. The EA order parameter, which means the average overlap between different codewords, needs to be zero to reach Shannon's limit, as in several known schemes which saturate this limit. Therefore, a vanishing EA order parameter may be a necessary condition for reaching Shannon's limit.

Since encoding with our method requires exponential time, we need to employ efficient polynomial-time approximate encoding algorithms. An investigation of the influence of the number of hidden units on the accuracy of approximate encoding algorithms is under way.

In future work, we intend to evaluate the upper bound to the rate distortion property without the replica method.

APPENDIX A: ANALYTICAL EVALUATION
USING THE REPLICA METHOD

The entropy $S(D, R)$ can be evaluated by the replica method:

$$S(D, R) = \lim_{n \to 0} \frac{1}{nN} \ln \left\langle \mathcal{N}^n(D, R) \right\rangle_{\boldsymbol{y}, \boldsymbol{x}}. \qquad (A1)$$

A moment $\mathcal{N}^n(D, R)$, which is the number of codewords with respect to an $n$-replicated system, can be represented as

$$\mathcal{N}^n(D, R) = \mathop{\mathrm{Tr}}_{\boldsymbol{s}^1, \cdots, \boldsymbol{s}^n} \prod_{a=1}^{n} \delta\big(MD; d(\boldsymbol{y}, \hat{\boldsymbol{y}}(\boldsymbol{s}^a))\big), \qquad (A2)$$

where $\boldsymbol{s}^a = {}^t(\boldsymbol{s}_1^a, \cdots, \boldsymbol{s}_K^a)$ and the superscript $a$ denotes a replica index. Inserting an identity



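$$1 = \prod_{a<b} \prod_{l=1}^{K} \int_{-\infty}^{\infty} dq_l^{ab}\, \delta\left( \boldsymbol{s}_l^a \cdot \boldsymbol{s}_l^b - \frac{N}{K} q_l^{ab} \right) = \left( \frac{1}{2\pi i} \right)^{n(n-1)K/2} \int \Big( \prod_{a<b} \prod_{l=1}^{K} dq_l^{ab}\, d\hat{q}_l^{ab} \Big) \exp\left[ \sum_{a<b} \sum_{l=1}^{K} \hat{q}_l^{ab} \left( \boldsymbol{s}_l^a \cdot \boldsymbol{s}_l^b - \frac{N}{K} q_l^{ab} \right) \right], \qquad (A3)$$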



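into this expression to separate the relevant order parameter. Utilizing the Fourier expression of Kronecker's delta,

$$\delta\big(MD; d(\boldsymbol{y}, \hat{\boldsymbol{y}}(\boldsymbol{s}^a))\big) = \int_{c - i\pi}^{c + i\pi} \frac{d\beta^a}{2\pi i}\, e^{\beta^a \left( MD - d(\boldsymbol{y}, \hat{\boldsymbol{y}}(\boldsymbol{s}^a)) \right)}, \qquad \forall c \in \mathbf{R}, \qquad (A4)$$

we can calculate the average moment $\langle \mathcal{N}^n(D, R) \rangle_{\boldsymbol{y}, \boldsymbol{x}}$ for natural numbers $n$ as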
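$$\left\langle \mathcal{N}^n(D, R) \right\rangle_{\boldsymbol{y}, \boldsymbol{x}} = \int \Big( \prod_{a} d\beta^a \Big) \int \Big( \prod_{a<b} \prod_{l} dq_l^{ab}\, d\hat{q}_l^{ab} \Big) \exp N \Bigg[ R^{-1} \ln \left\langle \int \Big( \prod_{l} \prod_{a} \frac{du_l^a\, dv_l^a}{2\pi} \Big) \prod_{l} e^{-\frac{1}{2} {}^t\boldsymbol{v}_l Q_l \boldsymbol{v}_l + i \boldsymbol{v}_l \cdot \boldsymbol{u}_l} \prod_{a} \left\{ e^{-\beta^a} + (1 - e^{-\beta^a})\, \Theta(y, \{u_l^a\}) \right\} \right\rangle_y + \frac{1}{K} \sum_{l} \ln \mathop{\mathrm{Tr}}_{\{s^a\}} \exp\Big( \sum_{a<b} \hat{q}_l^{ab} s^a s^b \Big) - \frac{1}{K} \sum_{a<b} \sum_{l} q_l^{ab} \hat{q}_l^{ab} + R^{-1} \sum_{a} \beta^a D \Bigg], \qquad (A5)$$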



where $Q_l$ is an $n \times n$ matrix having matrix elements $\{q_l^{ab}\}$ and $\langle h(y) \rangle_y = \sum_{y \in \{-1, 1\}} [\frac{1}{2} \delta(y - 1) + \frac{1}{2} \delta(y + 1)]\, h(y)$. The function $\Theta(y, \{u_l^a\})$ included in the right-hand side of (A5) depends on the decoder (details are discussed in the following sections). We analyze the system in the thermodynamic limit $N, M \to \infty$, while the code rate $R$ is kept finite. The integral (A5) will be dominated by the saddle-point of the extensive exponent and can be evaluated via a saddle-point problem with respect to $\beta^a$, $q_l^{ab}$ and $\hat{q}_l^{ab}$. Here, we assume the replica symmetric (RS) ansatz:

$$\beta^a = \beta, \qquad q_l^{ab} = (1 - q)\, \delta_{ab} + q, \qquad \hat{q}_l^{ab} = (1 - \hat{q})\, \delta_{ab} + \hat{q}, \qquad (A6)$$

where $\delta_{k,k'}$ is Kronecker's delta taking 1 if $k = k'$ and 0 otherwise. This ansatz means that all the hidden units are equivalent after averaging over the disorder.



1. Lossy compression using committee tree 
for general K 

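In the lossy compression using the committee tree, the $\Theta(y, \{u_l^a\})$ included in (A5) is obtained as

$$\Theta(y, \{u_l^a\}) = \Theta\Big( y \sum_{l=1}^{K} \mathrm{sgn}(u_l^a) \Big). \qquad (A7)$$

Therefore, we obtain the average entropy $S_{CT}(D, R)$ as

$$S_{CT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \ln\left\{ e^{-\beta} + (1 - e^{-\beta})\, \Sigma(\{t_l\}; y) \right\} \right\rangle_y + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (A8)$$

where

$$\Sigma(\{t_l\}; y) = \mathop{\mathrm{Tr}}_{\{\tau_l = \pm 1\}} \Theta\Big( -y \sum_{l=1}^{K} \tau_l \Big) \prod_{l=1}^{K} H(Q \tau_l t_l), \qquad (A9)$$

with $Q = \sqrt{q/(1 - q)}$. Utilizing the Fourier expression of the step function, $\Theta(x) = \int_0^\infty dX \int \frac{d\lambda}{2\pi}\, e^{i\lambda(X - x)}$, the saddle-point equations

$$\frac{\partial S_{CT}}{\partial \beta} = \frac{\partial S_{CT}}{\partial q} = \frac{\partial S_{CT}}{\partial \hat{q}} = 0$$

become

$$q = \int Du \tanh^2 \sqrt{\hat{q}}\, u, \qquad (A10)$$

$$\hat{q} = 2R^{-1} \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \frac{-(1 - e^{-\beta})\, \Sigma'(\{t_l\}; y)}{e^{-\beta} + (1 - e^{-\beta})\, \Sigma(\{t_l\}; y)} \right\rangle_y, \qquad (A11)$$

$$D = \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \frac{e^{-\beta} - e^{-\beta}\, \Sigma(\{t_l\}; y)}{e^{-\beta} + (1 - e^{-\beta})\, \Sigma(\{t_l\}; y)} \right\rangle_y, \qquad (A12)$$

where $\Sigma'(\{t_l\}; y) = \partial \Sigma(\{t_l\}; y)/\partial q$. Substituting the solutions to the saddle-point equations into (A8), the average entropy $S_{CT}(D, R)$ is obtained. Thus, for any $K$, we can obtain a minimum code rate $R$, which gives $S_{CT}(D, R) = 0$ for a fixed distortion $D$.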
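Equation (A10) expresses $q$ as a Gaussian average of $\tanh^2$. As a minimal sketch of how this relation can be evaluated, the function below uses a trapezoidal rule, assuming the conventional measure $Du = e^{-u^2/2}\, du / \sqrt{2\pi}$ (not defined explicitly in the text above). The name `q_from_qhat` is our own.

```python
import math


def q_from_qhat(q_hat, half_width=10.0, steps=4000):
    """Trapezoidal estimate of q = int Du tanh^2(sqrt(q_hat) u) for q_hat >= 0."""
    root = math.sqrt(q_hat)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        total += weight * gauss * math.tanh(root * u) ** 2
    return total * h


if __name__ == "__main__":
    for q_hat in (0.0, 0.1, 1.0, 4.0):
        print(f"q_hat = {q_hat:3.1f}  q = {q_from_qhat(q_hat):.6f}")
```

The map is monotonic, gives $q = 0$ at $\hat{q} = 0$, and behaves as $q \approx \hat{q}$ for small $\hat{q}$, so a vanishing conjugate parameter implies a vanishing overlap.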



2. Lossy compression using committee tree 
for large K 



We concentrate in the following on the simple case of large $K$, where the $K$-multiple integrals can be reduced to a single Gaussian integral. We assume that the number of hidden units $K$ is large but still $K \ll N$. Here, the term $\Sigma(\{t_l\}; y)$ included in (A8) does not depend on all the individual integration variables $t_l$ but only on the combination $\sum_{l=1}^{K} [2H(Q t_l) - 1]$. With the central limit theorem, the term is given by

$$\Sigma(\{t_l\}; y) = \int_0^\infty dX \int \frac{d\lambda}{2\pi} \exp\left\{ i\lambda X + i\lambda y \frac{1}{\sqrt{K}} \sum_{l=1}^{K} [2H(Q t_l) - 1] - \frac{\lambda^2}{2} \left( 1 - \frac{1}{K} \sum_{l=1}^{K} [2H(Q t_l) - 1]^2 \right) \right\}. \qquad (A13)$$
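Therefore, we obtain the averaged entropy as

$$S_{CT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \int Dt \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) H(Q_{\mathrm{eff}}\, t) \right\} + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (A14)$$

where $q_{\mathrm{eff}} = \int Dt\, [1 - 2H(Qt)]^2 = \frac{2}{\pi} \sin^{-1} q$, and the saddle-point equations are

$$q = \int Du \tanh^2 \sqrt{\hat{q}}\, u, \qquad (A15)$$

$$\hat{q} = 2R^{-1} \int Dt\, \frac{-(1 - e^{-\beta})\, H'(Q_{\mathrm{eff}}\, t)}{e^{-\beta} + (1 - e^{-\beta})\, H(Q_{\mathrm{eff}}\, t)}, \qquad (A16)$$

$$D = \int Dt\, \frac{e^{-\beta} - e^{-\beta} H(Q_{\mathrm{eff}}\, t)}{e^{-\beta} + (1 - e^{-\beta})\, H(Q_{\mathrm{eff}}\, t)}, \qquad (A17)$$

with $Q_{\mathrm{eff}} = \sqrt{q_{\mathrm{eff}}/(1 - q_{\mathrm{eff}})}$ and $H'(Q_{\mathrm{eff}}\, t) = \partial H(Q_{\mathrm{eff}}\, t)/\partial q$.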



3. Lossy compression using parity tree 
for general K 

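In the lossy compression using the parity tree, on the other hand, the $\Theta(y, \{u_l^a\})$ included in (A5) is obtained as

$$\Theta(y, \{u_l^a\}) = \Theta\Big( y \prod_{l=1}^{K} \mathrm{sgn}(u_l^a) \Big). \qquad (A18)$$

Hence, we obtain the averaged entropy $S_{PT}(D, R)$ as

$$S_{PT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \left\langle \int \Big( \prod_{l=1}^{K} Dt_l \Big) \ln\left\{ e^{-\beta} + (1 - e^{-\beta})\, \Pi(\{t_l\}; y) \right\} \right\rangle_y + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}, \qquad (A19)$$

where

$$\Pi(\{t_l\}; y) = \frac{1}{2} \left( 1 + y \prod_{l=1}^{K} [1 - 2H(Q t_l)] \right). \qquad (A20)$$

For the cases utilizing a committee tree and a parity tree, only the terms $\Sigma(\{t_l\}; y)$ and $\Pi(\{t_l\}; y)$ are different. Since both the order parameters $q$ and $\hat{q}$ at the saddle-point of (A19) are less than one, the average entropy $S_{PT}(D, R)$ can be expanded with respect to $\prod_{l=1}^{K} [1 - 2H(Q t_l)]$ $(< 1)$ as

$$S_{PT}(D, R) = \mathop{\mathrm{extr}}_{\beta, q, \hat{q}} \left\{ R^{-1} \left[ \ln \frac{1 + e^{-\beta}}{2} - \sum_{m=1}^{\infty} \frac{1}{2m} \left( \frac{1 - e^{-\beta}}{1 + e^{-\beta}} \right)^{2m} \left( \int Dt\, [1 - 2H(Qt)]^{2m} \right)^{K} \right] + \int Du \ln 2\cosh \sqrt{\hat{q}}\, u - \frac{\hat{q}(1 - q)}{2} + R^{-1} \beta D \right\}. \qquad (A21)$$

We obtain the saddle-point equations using this expanded form of the averaged entropy:

$$q = \int Du \tanh^2 \sqrt{\hat{q}}\, u, \qquad (A22)$$

$$\hat{q} = 2KR^{-1} \sum_{m=1}^{\infty} \left( \frac{1 - e^{-\beta}}{1 + e^{-\beta}} \right)^{2m} \left[ \int Dt\, (1 - 2H(Qt))^{2m} \right]^{K-1} \int Dt\, \frac{t\, e^{-Q^2 t^2/2}\, (1 - 2H(Qt))^{2m-1}}{\sqrt{2\pi q}\, (1 - q)^{3/2}}, \qquad (A23)$$

$$D = \frac{e^{-\beta}}{1 + e^{-\beta}} + \frac{2e^{-\beta}}{(1 + e^{-\beta})^2} \sum_{m=1}^{\infty} \left( \frac{1 - e^{-\beta}}{1 + e^{-\beta}} \right)^{2m-1} \left[ \int Dt\, (1 - 2H(Qt))^{2m} \right]^{K}. \qquad (A24)$$

For $K \ge 2$, because of the existence of the term $[\int Dt\, (1 - 2H(Qt))^{2m}]^{K-1}$ in (A23), solutions to the saddle-point equations can become $q = \hat{q} = 0$. We can find no other solutions except for $q = \hat{q} = 0$ by solving (A22)-(A24) numerically for $K \ge 2$. Substituting this into (A24), we obtain $D = e^{-\beta}/(1 + e^{-\beta})$.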



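APPENDIX B: ALMEIDA-THOULESS
INSTABILITY OF REPLICA SYMMETRIC
SOLUTION

1. General case

The Hessian computed at the replica symmetric saddle-point characterizes fluctuations in the order parameters $\beta^a$, $q_l^{ab}$ and $\hat{q}_l^{ab}$ around the RS saddle-point. Instability of the RS solution is signaled by a change of sign of at least one of the eigenvalues of the Hessian. Let $\mathcal{M}(\{\beta^a\}, \{q_l^{ab}\}, \{\hat{q}_l^{ab}\})$ be the exponent of the integrand of the integral (A5). Equation (A5) can be represented as

$$\left\langle \mathcal{N}^n(D, R) \right\rangle_{\boldsymbol{y}, \boldsymbol{x}} = \int \Big( \prod_{a} d\beta^a \Big) \int \Big( \prod_{a<b} \prod_{l} dq_l^{ab}\, d\hat{q}_l^{ab} \Big) \exp\big( N \mathcal{M}(\{\beta^a\}, \{q_l^{ab}\}, \{\hat{q}_l^{ab}\}) \big). \qquad (B1)$$

We expand $\mathcal{M}$ around $\beta$, $q$ and $\hat{q}$ in $\delta\beta^a$, $\delta q_l^{ab}$ and $\delta\hat{q}_l^{ab}$ and then find up to second order

$$\mathcal{M}(\{\beta + \delta\beta^a\}, \{q + \delta q_l^{ab}\}, \{\hat{q} + \delta\hat{q}_l^{ab}\}) = \mathcal{M}(\{\beta\}, \{q\}, \{\hat{q}\}) + \frac{1}{2}\, {}^t\boldsymbol{\nu}\, G\, \boldsymbol{\nu} + O(\|\boldsymbol{\nu}\|^3), \qquad (B2)$$

where

$$\boldsymbol{\nu} = {}^t\big( \{\delta\beta^a\}, \{\delta q_1^{ab}\}, \{\delta\hat{q}_1^{ab}\}, \cdots, \{\delta q_K^{ab}\}, \{\delta\hat{q}_K^{ab}\} \big), \qquad (B3)$$

is the perturbation to the RS saddle-point. The Hessian $G$ is the following $[n + Kn(n-1)] \times [n + Kn(n-1)]$ matrix:

$$G = \begin{pmatrix} S & T & T & \cdots & T \\ {}^tT & U & V & \cdots & V \\ {}^tT & V & U & \cdots & V \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {}^tT & V & V & \cdots & U \end{pmatrix}, \qquad (B4)$$

where the $n \times n$ matrix $S$, the $n \times n(n-1)$ matrix $T$ and the $n(n-1) \times n(n-1)$ matrices $U$, $V$ are

$$S = (\{S^{a,b}\}), \qquad T = (\{T^{a,bc}\}\ \{\hat{T}^{a,bc}\}), \qquad U = \begin{pmatrix} \{U^{ab,cd}\} & \{\tilde{U}^{ab,cd}\} \\ \{\tilde{U}^{ab,cd}\} & \{\hat{U}^{ab,cd}\} \end{pmatrix}, \qquad V = \begin{pmatrix} \{V^{ab,cd}\} & \{\tilde{V}^{ab,cd}\} \\ \{\tilde{V}^{ab,cd}\} & \{\hat{V}^{ab,cd}\} \end{pmatrix}, \qquad (B5)$$

with

$$\begin{aligned}
S^{a,b} &= \partial^2 \mathcal{M} / \partial\beta^a \partial\beta^b, \\
T^{a,bc} &= \partial^2 \mathcal{M} / \partial\beta^a \partial q_l^{bc}, \\
\hat{T}^{a,bc} &= \partial^2 \mathcal{M} / \partial\beta^a \partial\hat{q}_l^{bc}, \\
U^{ab,cd} &= \partial^2 \mathcal{M} / \partial q_l^{ab} \partial q_l^{cd}, \\
\tilde{U}^{ab,cd} &= \partial^2 \mathcal{M} / \partial q_l^{ab} \partial\hat{q}_l^{cd}, \\
\hat{U}^{ab,cd} &= \partial^2 \mathcal{M} / \partial\hat{q}_l^{ab} \partial\hat{q}_l^{cd}, \\
V^{ab,cd} &= \partial^2 \mathcal{M} / \partial q_l^{ab} \partial q_{l'}^{cd} \quad (l \ne l'), \\
\tilde{V}^{ab,cd} &= \partial^2 \mathcal{M} / \partial q_l^{ab} \partial\hat{q}_{l'}^{cd} \quad (l \ne l'), \\
\hat{V}^{ab,cd} &= \partial^2 \mathcal{M} / \partial\hat{q}_l^{ab} \partial\hat{q}_{l'}^{cd} \quad (l \ne l').
\end{aligned} \qquad (B6)$$

For $(\beta, q, \hat{q})$ to be a local maximum of $\mathcal{M}$, it is necessary for the Hessian $G$ to be negative definite, i.e., all of its eigenvalues must be negative. The matrices $U$ and $V$ contain the quadratic fluctuations of the order parameters in the same and different hidden units, respectively. Because of the block form of $G$, the eigenproblem splits into an uncoupled diagonalization of the two matrices $U - V$ and

$$\tilde{G} = \begin{pmatrix} S & T \\ K\, {}^tT & U + (K-1)V \end{pmatrix}. \qquad (B7)$$

The eigenvectors of $U - V$ correspond to fluctuations in directions that break the permutation symmetry (PS). The eigenvectors of $\tilde{G}$ represent fluctuations that do not break this symmetry. The most unstable mode corresponds to an eigenvector of $\tilde{G}$ that breaks the replica symmetry (RS). We can write the eigenvalue equation as

$$\tilde{G} \boldsymbol{\mu} = \lambda \boldsymbol{\mu}, \qquad (B8)$$

with

$$\boldsymbol{\mu} = {}^t\big( \{\epsilon^a\}, \{\eta^{ab}\}, \{\hat{\eta}^{ab}\} \big). \qquad (B9)$$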



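There are three types of eigenvectors, i.e., $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and $\boldsymbol{\mu}_3$. The first has the form:

$$\epsilon^a = \epsilon, \qquad \eta^{ab} = \eta, \qquad \hat{\eta}^{ab} = \hat{\eta}. \qquad (B10)$$

Using the orthogonality of $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, the second type of eigenvector $\boldsymbol{\mu}_2$ has the form:

$$\epsilon^a = \begin{cases} (1 - n)\epsilon, & (a = \theta), \\ \epsilon, & (\text{otherwise}), \end{cases} \qquad \eta^{ab} = \begin{cases} \frac{1}{2}(2 - n)\eta, & (a = \theta \text{ or } b = \theta), \\ \eta, & (\text{otherwise}), \end{cases} \qquad \hat{\eta}^{ab} = \begin{cases} \frac{1}{2}(2 - n)\hat{\eta}, & (a = \theta \text{ or } b = \theta), \\ \hat{\eta}, & (\text{otherwise}), \end{cases} \qquad (B11)$$

for a specific replica $\theta$. In the limit $n \to 0$ this eigenvector $\boldsymbol{\mu}_2$ converges to $\boldsymbol{\mu}_1$, therefore the eigenvalue of the eigenvector $\boldsymbol{\mu}_2$ becomes degenerate with that of $\boldsymbol{\mu}_1$.

Similarly, using the orthogonality of $\boldsymbol{\mu}_2$ and $\boldsymbol{\mu}_3$, the third type of eigenvector $\boldsymbol{\mu}_3$ has the form:

$$\epsilon^a = 0, \qquad \eta^{ab} = \begin{cases} \frac{1}{2}(2 - n)(3 - n)\eta, & (a = \theta,\ b = \nu), \\ \frac{1}{2}(3 - n)\eta, & (a = \theta \text{ or } a = \nu \text{ or } b = \theta \text{ or } b = \nu), \\ \eta, & (\text{otherwise}), \end{cases} \qquad \hat{\eta}^{ab} = \begin{cases} \frac{1}{2}(2 - n)(3 - n)\hat{\eta}, & (a = \theta,\ b = \nu), \\ \frac{1}{2}(3 - n)\hat{\eta}, & (a = \theta \text{ or } a = \nu \text{ or } b = \theta \text{ or } b = \nu), \\ \hat{\eta}, & (\text{otherwise}), \end{cases} \qquad (B12)$$

for two specific replicas $\theta$ and $\nu$. In the limit $n \to 0$, perturbations keep the symmetry of the eigenvectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ across the replicas. Therefore, $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are irrelevant to replica symmetry breaking (RSB) but only determine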
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the stability within the RS ansatz. Hence, the third 
eigenvector // 3 , which is called the replicon mode, causes 
RSB. The eigenvalue equation G/j, 3 = with respect 
to llBT2l splits into Tfi 3 = and [17 + (K — l)V]fi 3 = 
A3/X3, where fi 3 = *(0,/x 3 ). Therefore, the eigenproblem 
of G is equivalent to that of U + (K — 1)V. 

Let us calculate the elements of U and V. The second 
derivative M by qf related to the U ab ' cd , V ab > cd is 



where 



d 2 M 
dqfdqft 



= Rr 1 < v?vtvf,vf, > UA 



-RT 1 < vfv^ >„,„< vf,vf, > U) «,(B13) 



< s(K» >u,v-- 



duidv 



ie 



'VtQtVi+iVfUi 



3(Kl) n( e ^° + C 1 - ^Xw. K})} 



J (j[du l dv l e-i tv ^ v ' +iV - u ^ nje^* + (1 ~ e-^)6(y,K})| 



(B14) 



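for any function $g(\{v_l^a\})$. The second derivative of $\mathcal{M}$ with respect to $\hat{q}_l^{ab}$, related to $U^{ab,cd}$ and $V^{ab,cd}$, is

$$\frac{\partial^2 \mathcal{M}}{\partial \hat{q}_l^{ab}\, \partial \hat{q}_{l'}^{cd}} = \begin{cases} K^{-1} \langle s^a s^b s^c s^d \rangle_s - K^{-1} \langle s^a s^b \rangle_s \langle s^c s^d \rangle_s, & (l = l'), \\ 0, & (l \neq l'), \end{cases} \tag{B15}$$

where

$$\langle g(\{s^a\}) \rangle_s = \frac{\int Dz\, \mathrm{Tr}_{\{s^a\}}\, g(\{s^a\}) \exp\left( \sqrt{\hat{q}}\, z \sum_a s^a \right)}{\int Dz\, \mathrm{Tr}_{\{s^a\}} \exp\left( \sqrt{\hat{q}}\, z \sum_a s^a \right)} \tag{B16}$$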

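for any function $g(\{s^a\})$. The second derivative of $\mathcal{M}$ with respect to $q_l^{ab}$ and $\hat{q}_{l'}^{cd}$, related to $U^{ab,cd}$ and $V^{ab,cd}$, is

$$\frac{\partial^2 \mathcal{M}}{\partial q_l^{ab}\, \partial \hat{q}_{l'}^{cd}} = \begin{cases} K^{-1}, & (l = l',\ a = c,\ b = d), \\ 0, & (\text{otherwise}). \end{cases} \tag{B17}$$

Using Gardner's method [18], we find that the RS stability criterion is

< 1, (B18)

where

$$\begin{aligned}
\gamma &= \gamma_0 + (K-1)\gamma_1, \\
\gamma_0 &= P - 2Q + R, \\
\gamma_1 &= P' - 2Q' + R', \\
P &= U^{ab,ab}, \qquad Q = U^{ab,ac} \ (b \neq c), \qquad R = U^{ab,cd} \ (a \neq c,\ b \neq d), \\
P' &= V^{ab,ab}, \qquad Q' = V^{ab,ac} \ (b \neq c), \qquad R' = V^{ab,cd} \ (a \neq c,\ b \neq d).
\end{aligned} \tag{B19}$$

The line on which equality holds in (B18) is called the AT line. Setting $K = 0$, on the other hand, the matrix $U + (K-1)V$ is equal to $U - V$. When $K = 0$, the inequality (B18) always holds. Therefore, permutation symmetry breaking (PSB) does not occur in this system.

2. For lossy compression using a parity tree with K = 2 hidden units

Let us consider the RS stability of lossy compression using a parity tree with $K = 2$ hidden units. Here, $\Theta(y, \{u_l^a\})$ is given by $\Theta(y, \{u_l^a\}) = \Theta\left( y \prod_l \mathrm{sgn}(u_l^a) \right)$; therefore, the solutions to the saddle-point equations are

$$q = \hat{q} = 0, \qquad D = \frac{e^{-\beta}}{1 + e^{-\beta}}. \tag{B20}$$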
Substituting (B20) into (B13) and (B15), we obtain

$$P' = R^{-1} \frac{4}{\pi^2} (1 - 2D)^2, \qquad P = Q = R = Q' = R' = 0. \tag{B21}$$



Therefore, using (B18), the RS stability condition can be obtained
as



R> ^(l-2D) 2 = R AT (D). 



(B22) 



This proves ((T^|) . 



3. For lossy compression using a parity tree with K ≥ 3 hidden units

Next, let us consider the RS stability of lossy compression using a parity tree with $K \geq 3$ hidden units. Here, the solutions to the saddle-point equations are $q = \hat{q} = 0$, $D = e^{-\beta}/(1 + e^{-\beta})$, as for $K = 2$. Substituting (B20) into (B13) and (B15), we obtain

$$P = Q = R = P' = Q' = R' = 0. \tag{B23}$$

Since the inequality (B18) always holds, the RS solution does not exhibit the AT instability throughout the achievable region for $K \geq 3$.
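One way to see why the inter-unit element $P'$ survives in (B21) but vanishes in (B23) is to look at a Gaussian moment that appears when (B14) is evaluated at $q = 0$. The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes the relevant quantity is proportional to $E[u_1 u_2\, \Theta(\prod_l \mathrm{sgn}(u_l))]$ for independent standard normal $u_l$ with $y = +1$, and the function name `parity_moment` is hypothetical. Because $|u_l|$ is independent of $\mathrm{sgn}(u_l)$, the expectation reduces to a finite sum over sign patterns.

```python
import math
from itertools import product


def parity_moment(K):
    """E[u_1 u_2 * Theta(prod_l sgn(u_l))] for K independent N(0, 1) variables.

    Each sign pattern has probability 2**-K and E|u| = sqrt(2/pi), so the
    expectation is an exact finite sum over the 2**K sign patterns.
    """
    mean_abs = math.sqrt(2.0 / math.pi)  # E|u| for u ~ N(0, 1)
    total = 0.0
    for signs in product((-1, 1), repeat=K):
        if math.prod(signs) > 0:  # Theta keeps only positive-parity patterns
            total += signs[0] * signs[1] * mean_abs ** 2
    return total / 2 ** K


for K in (2, 3, 4, 5):
    print(K, parity_moment(K))
```

For $K = 2$ the moment equals $1/\pi$, while for $K \geq 3$ the sign of $u_1 u_2$ is no longer fixed by the parity constraint and the contributions cancel.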



APPENDIX C: A LOWER BOUND TO THE 
RATE DISTORTION PROPERTY OF LOSSY 
COMPRESSION USING A PARITY TREE 

In order to derive a lower bound to the rate distortion 
property, an upper bound to the entropy is necessary. 
Using Jensen's inequality, an upper bound to the entropy
$S^{\rm upper}(D,R)$ is given by

$$S(D,R) = \frac{1}{N} \langle \ln \mathcal{N}(D,R) \rangle_{y,x} \leq \frac{1}{N} \ln \langle \mathcal{N}(D,R) \rangle_{y,x} = S^{\rm upper}(D,R). \tag{C1}$$



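After a simple calculation, we obtain the upper bound to the entropy of lossy compression using a parity tree, $S_{\rm PT}^{\rm upper}(D,R)$, as

$$\begin{aligned}
S_{\rm PT}^{\rm upper}(D,R) &= \ln 2 + \mathop{\rm extr}_{\beta} \left( R^{-1} \ln \frac{1 + e^{-\beta}}{2} + \beta R^{-1} D \right) \\
&= -R^{-1} \ln 2 + \ln 2 - R^{-1} D \ln D - R^{-1} (1 - D) \ln (1 - D).
\end{aligned} \tag{C2}$$

Note that this annealed entropy $S_{\rm PT}^{\rm anneal}(D,R)$ does not depend on the number of hidden units $K$. Solving $S_{\rm PT}^{\rm anneal}(D,R) = 0$ with respect to $R$, we obtain

$$R = 1 - h_2(D). \tag{C3}$$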



This shows that the rate distortion function for uniformly unbiased binary sources (2) can also be derived as a lower bound to the rate distortion property of compression using a parity tree.
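The step from (C2) to (C3) can be verified numerically. The following is a minimal sketch, assuming natural logarithms in (C2) and the binary entropy $h_2$ measured in bits; the function names and the bisection tolerances are illustrative choices, not part of the original derivation. It evaluates the extremum over $\beta$ at its stationary point $\beta = \ln[(1-D)/D]$, compares it with the closed form, and solves for the rate at which the annealed entropy vanishes.

```python
import math


def h2(D):
    """Binary entropy in bits."""
    return -D * math.log2(D) - (1 - D) * math.log2(1 - D)


def annealed_entropy(D, R):
    """First line of (C2), evaluated at the stationary point in beta."""
    # Setting the beta-derivative to zero gives D = e^-beta / (1 + e^-beta).
    beta = math.log((1 - D) / D)
    return math.log(2) + (math.log((1 + math.exp(-beta)) / 2) + beta * D) / R


def closed_form(D, R):
    """Second line of (C2)."""
    return (-math.log(2) / R + math.log(2)
            - D * math.log(D) / R - (1 - D) * math.log(1 - D) / R)


def rate_at_zero_entropy(D, lo=1e-9, hi=1.0, iters=100):
    """Bisection for the rate R in (lo, hi) where annealed_entropy(D, R) = 0.

    The entropy is negative for small R and positive at R = 1 (for D != 1/2),
    and it increases monotonically with R in between.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if annealed_entropy(D, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for D in (0.05, 0.1, 0.3):
    print(D, rate_at_zero_entropy(D), 1 - h2(D))
```

The two printed rates agree for each $D$, which is the content of (C3).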

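We next discuss an upper bound to the rate distortion property. In order to derive an upper bound to the rate distortion property, we need a lower bound to the entropy. Using Jensen's inequality, a lower bound to the entropy $S^{\rm lower}(D,R)$ is represented by

$$\begin{aligned}
S(D,R) &= \frac{1}{N} \langle \ln \mathcal{N}(D,R) \rangle_{y,x} \\
&= \frac{1}{N} \left\langle -\ln \frac{1}{\mathcal{N}(D,R)} \right\rangle_{y,x} \\
&\geq -\frac{1}{N} \ln \left\langle \frac{1}{\mathcal{N}(D,R)} \right\rangle_{y,x} \\
&= S^{\rm lower}(D,R).
\end{aligned} \tag{C4}$$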



This inequality can also be obtained from an annealed information entropy as follows. Following Opper's discussion [20], we first define a function that characterizes a version space:

$$p(s) = \frac{\delta[MD;\, d(y, \hat{y}(s))]}{\mathrm{Tr}_s\, \delta[MD;\, d(y, \hat{y}(s))]}. \tag{C5}$$



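Since this function $p(s)$ is non-negative and normalized to $\mathrm{Tr}_s\, p(s) = 1$, it defines a probability with respect to $s$. Therefore we obtain the information entropy per bit $\mathcal{H}(D,R)$ as

$$\begin{aligned}
\mathcal{H}(D,R) &= \frac{1}{N} \left\langle \ln \frac{1}{p(s)} \right\rangle_{s,y,x} \\
&\geq -\frac{1}{N} \ln \langle p(s) \rangle_{s,y,x} \\
&= -\frac{1}{N} \ln \left\langle \mathrm{Tr}_s\, p(s)^2 \right\rangle_{y,x} \\
&= -\frac{1}{N} \ln \left\langle \frac{1}{\mathcal{N}(D,R)} \right\rangle_{y,x},
\end{aligned} \tag{C6}$$

where $\langle g(s) \rangle_s = \mathrm{Tr}_s\, p(s) g(s)$. Using the identity

$$p(s) \ln \frac{1}{p(s)} = \begin{cases} 0, & \text{if } \delta(MD;\, d(y, \hat{y}(s))) = 0, \\ \mathcal{N}(D,R)^{-1} \ln \mathcal{N}(D,R), & \text{otherwise}, \end{cases} \tag{C7}$$

we can easily confirm $\mathcal{H}(D,R) = S(D,R)$.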
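The identity (C7) says that $p(s)$ is uniform on the version space, so its Shannon entropy is the logarithm of the number of codewords in that space. The sketch below illustrates this on a toy problem. Everything specific in it is a hypothetical assumption for illustration: the decoder `pair_product_decoder` (all pairwise products of the codeword bits) is not the parity tree of the paper, and the function names, sizes and target distortion are arbitrary.

```python
import math
from itertools import combinations, product


def pair_product_decoder(s):
    """Hypothetical toy decoder: the message is all pairwise products of s."""
    return tuple(s[i] * s[j] for i, j in combinations(range(len(s)), 2))


def version_space_entropy(y, decode, n_bits, target_errors):
    """Count codewords whose reproduction differs from y in exactly
    target_errors positions, and return (count, entropy of uniform p(s))."""
    members = [s for s in product((-1, 1), repeat=n_bits)
               if sum(a != b for a, b in zip(y, decode(s))) == target_errors]
    count = len(members)
    if count == 0:
        raise ValueError("empty version space: p(s) is undefined")
    p = 1.0 / count  # p(s) as in (C5): uniform on the version space
    entropy = sum(p * math.log(1.0 / p) for _ in members)
    return count, entropy


y = (1, 1, 1, 1, 1, 1)  # toy message of 6 bits for 4 codeword bits
count, entropy = version_space_entropy(y, pair_product_decoder, 4, 3)
print(count, entropy, math.log(count))
```

The last two printed values coincide, which is the unnormalized, single-sample form of $\mathcal{H}(D,R) = S(D,R)$.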

However, it is difficult to evaluate the lower bound $S^{\rm lower}(D,R)$ directly because $\langle \mathcal{N}(D,R)^{-1} \rangle_{y,x} > \langle \mathcal{N}(D,R) \rangle_{y,x}^{-1}$. This difficulty is caused by a limitation of the version space due to the distortion. This limitation complicates the probability $p(s)$.
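The ordering of the two Jensen bounds in (C1) and (C4) can be illustrated with a few numbers. This is a minimal sketch with hypothetical, equally weighted sample values standing in for $\mathcal{N}(D,R)$ over different $(y, x)$; the overall factor $1/N$ is omitted because it does not affect the ordering, and the function name is an illustrative choice.

```python
import math


def jensen_bounds(counts):
    """Return (lower, quenched, upper) for equally weighted positive counts.

    lower    = -ln <1/N>   (as in C4)
    quenched =  <ln N>
    upper    =  ln <N>     (as in C1)
    """
    m = len(counts)
    quenched = sum(math.log(c) for c in counts) / m
    upper = math.log(sum(counts) / m)
    lower = -math.log(sum(1.0 / c for c in counts) / m)
    return lower, quenched, upper


# Hypothetical sample values of the number of codewords.
lower, quenched, upper = jensen_bounds([1, 4, 16, 64])
print(lower, quenched, upper)
```

The strict gap between `lower` and `upper` is the statement that the average of the inverse exceeds the inverse of the average whenever the counts fluctuate.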